{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 04 - 图因果模型\n",
    "\n",
    "\n",
    "## 思考因果关系\n",
    "\n",
    "你有没有注意到 YouTube 视频中的那些厨师是如何出色地描述食物的？ “减少酱汁，直到达到天鹅绒般的稠度”。如果您刚刚开始学习如何烹饪，您甚至可能都不知道这意味着什么。给我点时间让我娓娓道来！对于因果关系，这是一回事。如果你走进一家酒吧，听到人们讨论因果关系（可能是经济系旁边的一家酒吧），你会听到他们讨论在确认移民对一个社区的影响的时候，收入这个混淆因子如何使得分析变得困难，所以他们不得不使用工具变量。到现在为止，您可能不明白他们在说什么。不过，我现在会开始试图解决一部分问题。\n",
    "\n",
    "图形模型是因果关系的语言。它们不仅是您用来与其他勇敢而真实的因果关系爱好者交谈的工具，也是您用来使自己的想法更清晰的工具。\n",
    "\n",
    "让我们以潜在结果的条件独立性为第一个例子来展开讲解。这个条件是我们在进行因果推断时需要为真的主要假设之一：\n",
    "\n",
    "$\n",
    "(Y_0, Y_1) \\perp T | X\n",
    "$\n",
    "\n",
    "条件独立性使我们有可能衡量完全由于干预对结果产生的影响，而不是任何其他潜伏在周围的变量。一个典型的例子是药物对病人的影响。如果只有重病患者才能服用这种药物，那么服用这种药物甚至可能会降低患者的健康。那是因为严重程度的影响与药物的影响混淆了。但是，如果我们将患者按重症和非重症病例分类，并分析每个子分类小组中的药物影响，我们将更清楚地了解真正的效果是什么。这种按特征拆解样本的方法就是我们所说的对 X 的控制或调节。通过对严重病例进行调节，治疗机制变得和随机一样好。严重组中的患者可能会或可能不会仅仅因为偶然而接受药物，而不是因为高严重性，因为所有患者在这个维度上都是相同的。如果治疗就像在组内随机分配一样，则治疗有条件地独立于潜在结果。\n",
    "\n",
    "独立性和条件独立性是因果推理的核心。然而，仅仅围绕这些概念展开会让学习很困难。另一方面，如果我们使用正确的语言来描述这个问题，会让困难减轻一点。这就是**因果图模型**的用武之地。因果图模型是一种表示因果关系如何起作用的方式，即是什么导致了什么。\n",
    "\n",
    "图模型如下所示"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "In Anaconda environment, use the following command to successfully install and import graphviz:\n",
    "\n",
    ">conda install -c anaconda graphviz python-graphviz\n",
python -m">
    ">python -c \"import graphviz as g\"\n",
    "'''\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import graphviz as gr\n",
    "from matplotlib import style\n",
    "import seaborn as sns\n",
    "from matplotlib import pyplot as plt\n",
    "style.use(\"fivethirtyeight\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"296pt\" height=\"188pt\"\r\n",
       " viewBox=\"0.00 0.00 296.20 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 292.196,-184 292.196,4 -4,4\"/>\r\n",
       "<!-- Z -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>Z</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Z</text>\r\n",
       "</g>\r\n",
       "<!-- X -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>X</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X</text>\r\n",
       "</g>\r\n",
       "<!-- Z&#45;&gt;X -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>Z&#45;&gt;X</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M27,-143.697C27,-135.983 27,-126.712 27,-118.112\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"30.5001,-118.104 27,-108.104 23.5001,-118.104 30.5001,-118.104\"/>\r\n",
       "</g>\r\n",
       "<!-- U -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>U</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"99\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">U</text>\r\n",
       "</g>\r\n",
       "<!-- U&#45;&gt;X -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>U&#45;&gt;X</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M84.4297,-146.834C74.2501,-136.938 60.4761,-123.546 48.9694,-112.359\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"51.4055,-109.846 41.7957,-105.385 46.5259,-114.865 51.4055,-109.846\"/>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"99\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- U&#45;&gt;Y -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>U&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M99,-143.697C99,-135.983 99,-126.712 99,-118.112\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"102.5,-118.104 99,-108.104 95.5001,-118.104 102.5,-118.104\"/>\r\n",
       "</g>\r\n",
       "<!-- A -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>A</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"138\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"138\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">A</text>\r\n",
       "</g>\r\n",
       "<!-- U&#45;&gt;A -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>U&#45;&gt;A</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M112.587,-146.017C120.733,-136.028 130.38,-122.171 135,-108 141.5,-88.0656 141.955,-64.2741 140.945,-46.3069\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"144.427,-45.9291 140.193,-36.2172 137.446,-46.4499 144.427,-45.9291\"/>\r\n",
       "</g>\r\n",
       "<!-- Y&#45;&gt;A -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>Y&#45;&gt;A</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M108.045,-72.7646C112.82,-64.1948 118.782,-53.494 124.127,-43.9004\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"127.194,-45.5865 129.004,-35.1473 121.079,-42.1795 127.194,-45.5865\"/>\r\n",
       "</g>\r\n",
       "<!-- 药物 -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>药物</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"217\" cy=\"-90\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"217\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">药物</text>\r\n",
       "</g>\r\n",
       "<!-- 存活 -->\r\n",
       "<g id=\"node7\" class=\"node\"><title>存活</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"233\" cy=\"-18\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"233\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">存活</text>\r\n",
       "</g>\r\n",
       "<!-- 药物&#45;&gt;存活 -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>药物&#45;&gt;存活</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M220.873,-72.055C222.655,-64.2609 224.812,-54.8219 226.811,-46.0789\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"230.235,-46.8038 229.051,-36.2753 223.411,-45.244 230.235,-46.8038\"/>\r\n",
       "</g>\r\n",
       "<!-- 严重程度 -->\r\n",
       "<g id=\"node8\" class=\"node\"><title>严重程度</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"244\" cy=\"-162\" rx=\"44.393\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"244\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">严重程度</text>\r\n",
       "</g>\r\n",
       "<!-- 严重程度&#45;&gt;药物 -->\r\n",
       "<g id=\"edge8\" class=\"edge\"><title>严重程度&#45;&gt;药物</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M237.464,-144.055C234.38,-136.059 230.628,-126.331 227.183,-117.4\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"230.34,-115.859 223.476,-107.789 223.809,-118.379 230.34,-115.859\"/>\r\n",
       "</g>\r\n",
       "<!-- 严重程度&#45;&gt;存活 -->\r\n",
       "<g id=\"edge7\" class=\"edge\"><title>严重程度&#45;&gt;存活</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M248.575,-143.825C252.711,-125.816 257.515,-96.7505 253,-72 251.326,-62.8208 248.145,-53.1658 244.805,-44.6485\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"247.973,-43.1502 240.892,-35.2692 241.513,-45.8458 247.973,-43.1502\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea1384108>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"Z\", \"X\")\n",
    "g.edge(\"U\", \"X\")\n",
    "g.edge(\"U\", \"Y\")\n",
    "g.edge('Y', 'A')\n",
    "g.edge('U', 'A')\n",
    "\n",
    "g.edge(\"药物\", \"存活\")\n",
    "g.edge(\"严重程度\", \"存活\")\n",
    "g.edge(\"严重程度\", \"药物\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "每个节点都是一个随机变量。我们使用箭头或边来显示一个变量是否会导致另一个变量的变化。在上面的第一个图形模型中，我们说 Z 导致 X，U 导致 X 和 Y。举一个更具体的例子，我们可以将药物对患者生存影响的想法转化为上面的第二个图形。疾病严重程度会同时导致服药的干预和病人的存活，服药本身也带来病人的存活。正如我们将看到的，这种因果图模型语言将帮助我们更清晰地思考因果关系，因为它明确了我们对世界如何运作的信念。\n",
    "\n",
    "## 图模型快速入门\n",
    "\n",
    "图模型本身可能需要一整个学期的[课程](https://www.coursera.org/specializations/probabilistic-graphical-models)。但是，就我们的目的而言，了解图模型需要什么样的独立性和条件独立性假设是（非常）重要的。正如我们将看到的，独立性通过图形模型展开，就像水流过溪流一样。我们可以停止这个流程，也可以启用它，这取决于我们如何处理其中的变量。为了理解这一点，让我们检查一些常见的图形结构和示例。它们非常简单，但这些基本模块足以帮助我们理解图模型上有关独立性和条件独立性的所有内容。\n",
    "\n",
    "首先，看看这个非常简单的图。 A导致B，B导致C。或者X导致Y导致Z。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"272pt\" height=\"188pt\"\r\n",
       " viewBox=\"0.00 0.00 272.44 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 268.445,-184 268.445,4 -4,4\"/>\r\n",
       "<!-- A -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>A</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">A</text>\r\n",
       "</g>\r\n",
       "<!-- B -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>B</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">B</text>\r\n",
       "</g>\r\n",
       "<!-- A&#45;&gt;B -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>A&#45;&gt;B</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M27,-143.697C27,-135.983 27,-126.712 27,-118.112\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"30.5001,-118.104 27,-108.104 23.5001,-118.104 30.5001,-118.104\"/>\r\n",
       "</g>\r\n",
       "<!-- C -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>C</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">C</text>\r\n",
       "</g>\r\n",
       "<!-- B&#45;&gt;C -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>B&#45;&gt;C</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M27,-71.6966C27,-63.9827 27,-54.7125 27,-46.1124\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"30.5001,-46.1043 27,-36.1043 23.5001,-46.1044 30.5001,-46.1043\"/>\r\n",
       "</g>\r\n",
       "<!-- X -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>X</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"99\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X</text>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"99\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- X&#45;&gt;Y -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>X&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M99,-143.697C99,-135.983 99,-126.712 99,-118.112\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"102.5,-118.104 99,-108.104 95.5001,-118.104 102.5,-118.104\"/>\r\n",
       "</g>\r\n",
       "<!-- Z -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>Z</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"99\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Z</text>\r\n",
       "</g>\r\n",
       "<!-- Y&#45;&gt;Z -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>Y&#45;&gt;Z</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M99,-71.6966C99,-63.9827 99,-54.7125 99,-46.1124\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"102.5,-46.1043 99,-36.1043 95.5001,-46.1044 102.5,-46.1043\"/>\r\n",
       "</g>\r\n",
       "<!-- 因果推断知识 -->\r\n",
       "<g id=\"node7\" class=\"node\"><title>因果推断知识</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"204\" cy=\"-162\" rx=\"60.3893\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"204\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">因果推断知识</text>\r\n",
       "</g>\r\n",
       "<!-- 解决业务问题 -->\r\n",
       "<g id=\"node8\" class=\"node\"><title>解决业务问题</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"204\" cy=\"-90\" rx=\"60.3893\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"204\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">解决业务问题</text>\r\n",
       "</g>\r\n",
       "<!-- 因果推断知识&#45;&gt;解决业务问题 -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>因果推断知识&#45;&gt;解决业务问题</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M204,-143.697C204,-135.983 204,-126.712 204,-118.112\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"207.5,-118.104 204,-108.104 200.5,-118.104 207.5,-118.104\"/>\r\n",
       "</g>\r\n",
       "<!-- 职位晋升 -->\r\n",
       "<g id=\"node9\" class=\"node\"><title>职位晋升</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"204\" cy=\"-18\" rx=\"44.393\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"204\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">职位晋升</text>\r\n",
       "</g>\r\n",
       "<!-- 解决业务问题&#45;&gt;职位晋升 -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>解决业务问题&#45;&gt;职位晋升</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M204,-71.6966C204,-63.9827 204,-54.7125 204,-46.1124\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"207.5,-46.1043 204,-36.1043 200.5,-46.1044 207.5,-46.1043\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea13720c8>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"A\", \"B\")\n",
    "g.edge(\"B\", \"C\")\n",
    "\n",
    "g.edge(\"X\", \"Y\")\n",
    "g.edge(\"Y\", \"Z\")\n",
    "g.node(\"Y\", \"Y\", color=\"red\")\n",
    "\n",
    "\n",
    "g.edge(\"因果推断知识\", \"解决业务问题\")\n",
    "g.edge(\"解决业务问题\", \"职位晋升\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在第一个图中，依赖性沿箭头方向流动。举一个更具体的例子，假设懂得因果推断的知识是解决业务问题的唯一途径，而解决这些问题是获得工作晋升的唯一途径。因此，因果知识导致问题解决，从而导致工作晋升。在这里我们可以说职位晋升依赖于因果知识。因果知识越多，获得晋升的机会就越大。请注意，依赖性是对称的，尽管它不太直观。你升职的机会越大，反过来说明你有因果推断知识的概率就越大，否则就很难升职。\n",
    "\n",
    "现在，假设我以中间变量为条件。在这种情况下，依赖被阻塞。因此，给定 Y，X 和 Z 是独立的。在上图中，红色表示 Y 是一个条件变量。出于同样的原因，在我们的示例中，如果我知道您擅长解决问题，那么知道您知道因果推理并不会提供有关您获得工作晋升机会的任何进一步信息。在数学上，\\\\(E[Promotion|Solve \\ questions, Causal \\ knowledge]=E[Promotion|Solve \\ questions]\\\\)。反之亦然，一旦我知道你在解决问题方面有多擅长，了解你的工作晋升状态不会让我进一步了解你知道因果推断的可能性。\n",
    "\n",
    "作为一般规则，当我们以中间变量 C 为条件时，从 A 到 B 的直接路径中的依赖流被阻塞。或者，\n",
    "\n",
    "$A \\not\\!\\perp\\!\\!\\!\\perp B$\n",
    "\n",
    "和\n",
    "\n",
    "$\n",
    "A \\!\\perp\\!\\!\\!\\perp B | C\n",
    "$\n",
    "\n",
    "现在，让我们考虑一个分叉结构。在这种情况下，同一个变量会导致图中的其他两个变量。在这种情况下，依赖关系通过箭头向后流动，我们有所谓的**后门路径**。我们可以通过以共同原因为条件来关闭后门路径并关闭依赖。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"490pt\" height=\"116pt\"\r\n",
       " viewBox=\"0.00 0.00 490.20 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-112 486.196,-112 486.196,4 -4,4\"/>\r\n",
       "<!-- C -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>C</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"63\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"63\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">C</text>\r\n",
       "</g>\r\n",
       "<!-- A -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>A</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">A</text>\r\n",
       "</g>\r\n",
       "<!-- C&#45;&gt;A -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>C&#45;&gt;A</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.6504,-72.7646C50.2885,-64.2831 44.8531,-53.7144 39.9587,-44.1974\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"42.9904,-42.4395 35.3043,-35.1473 36.7654,-45.6409 42.9904,-42.4395\"/>\r\n",
       "</g>\r\n",
       "<!-- B -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>B</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"99\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">B</text>\r\n",
       "</g>\r\n",
       "<!-- C&#45;&gt;B -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>C&#45;&gt;B</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M71.3496,-72.7646C75.7115,-64.2831 81.1469,-53.7144 86.0413,-44.1974\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"89.2346,-45.6409 90.6957,-35.1473 83.0096,-42.4395 89.2346,-45.6409\"/>\r\n",
       "</g>\r\n",
       "<!-- X -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>X</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"207\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X</text>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"171\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"171\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- X&#45;&gt;Y -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>X&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M198.65,-72.7646C194.288,-64.2831 188.853,-53.7144 183.959,-44.1974\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"186.99,-42.4395 179.304,-35.1473 180.765,-45.6409 186.99,-42.4395\"/>\r\n",
       "</g>\r\n",
       "<!-- Z -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>Z</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"243\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"243\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Z</text>\r\n",
       "</g>\r\n",
       "<!-- X&#45;&gt;Z -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>X&#45;&gt;Z</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M215.35,-72.7646C219.712,-64.2831 225.147,-53.7144 230.041,-44.1974\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"233.235,-45.6409 234.696,-35.1473 227.01,-42.4395 233.235,-45.6409\"/>\r\n",
       "</g>\r\n",
       "<!-- 统计学 -->\r\n",
       "<g id=\"node7\" class=\"node\"><title>统计学</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"385\" cy=\"-90\" rx=\"35.9954\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"385\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">统计学</text>\r\n",
       "</g>\r\n",
       "<!-- 因果推断 -->\r\n",
       "<g id=\"node8\" class=\"node\"><title>因果推断</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"332\" cy=\"-18\" rx=\"44.393\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"332\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">因果推断</text>\r\n",
       "</g>\r\n",
       "<!-- 统计学&#45;&gt;因果推断 -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>统计学&#45;&gt;因果推断</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M372.708,-72.7646C366.18,-64.143 358.019,-53.3647 350.722,-43.7267\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"353.421,-41.4933 344.594,-35.6334 347.84,-45.7187 353.421,-41.4933\"/>\r\n",
       "</g>\r\n",
       "<!-- 机器学习 -->\r\n",
       "<g id=\"node9\" class=\"node\"><title>机器学习</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"438\" cy=\"-18\" rx=\"44.393\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"438\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">机器学习</text>\r\n",
       "</g>\r\n",
       "<!-- 统计学&#45;&gt;机器学习 -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>统计学&#45;&gt;机器学习</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M397.292,-72.7646C403.82,-64.143 411.981,-53.3647 419.278,-43.7267\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"422.16,-45.7187 425.406,-35.6334 416.579,-41.4933 422.16,-45.7187\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea137e788>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"C\", \"A\")\n",
    "g.edge(\"C\", \"B\")\n",
    "\n",
    "g.edge(\"X\", \"Y\")\n",
    "g.edge(\"X\", \"Z\")\n",
    "g.node(\"X\", \"X\", color=\"red\")\n",
    "\n",
    "g.edge(\"统计学\", \"因果推断\")\n",
    "g.edge(\"统计学\", \"机器学习\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "看下面这个例子。假设您对统计学的了解使您对因果推理和机器学习都有了更多的了解。在我不知道你的统计知识水平的情况下，如果知道你擅长因果推理，就能推断更有可能你也擅长机器学习。这是因为，即使我不知道你的统计知识水平，我也可以从你的因果推理知识中推断出来：如果你擅长因果推理，你可能擅长统计学，这也让你更有可能擅长在机器学习方面。\n",
    "\n",
    "现在，如果我以你对统计学的知识为已知条件，那么你对机器学习的了解程度将与你对因果推理的了解程度无关。你看，知道你的统计水平已经为我提供了推断你的机器学习技能水平所需的所有信息。在这种情况下，了解您的因果推理水平不会提供更多的信息。\n",
    "\n",
    "作为一般规则，共享一个同一个原因的两个变量是相关的，但当我们以共同原因为条件时它们则是独立的。数学上可以表示为：\n",
    "\n",
    "$A \\not\\!\\perp\\!\\!\\!\\perp B$\n",
    "\n",
    "和\n",
    "\n",
    "$\n",
    "A \\!\\perp\\!\\!\\!\\perp B | C\n",
    "$\n",
    "\n",
    "唯一缺少的结构是受冲突因子（collider，有部分中文资料将其直接翻译为对撞机，感觉比较生硬）。当两个箭头在单个变量上相遇，我们就称这种现象为冲突。我们可以说，在这种情况下，两个变量共享一个共同的效果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"B\", \"C\")\n",
    "g.edge(\"A\", \"C\")\n",
    "\n",
    "g.edge(\"Y\", \"X\")\n",
    "g.edge(\"Z\", \"X\")\n",
    "g.node(\"X\", \"X\", color=\"red\")\n",
    "\n",
    "g.edge(\"statistics\", \"job promotion\")\n",
    "g.edge(\"flatter\", \"job promotion\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "例如，考虑有两种方法可以得到工作晋升。你要么擅长统计，要么奉承你的老板。如果我不以你的职位晋升为条件，也就是说，我不知道你是否会得到或不会得到，那么你的统计和奉承水平是独立的。换句话说，知道你在统计方面有多好，我无法说明你在奉承老板方面有多好。另一方面，如果您确实获得了工作晋升，那么突然之间，了解您的统计数据水平会告诉我您的奉承程度。如果你不擅长统计并且确实得到了晋升，那么你更有可能知道如何奉承，否则你就不会得到晋升。相反，如果你不擅长奉承，那一定是你擅长统计。这种现象有时被称为 **explaining away**，因为一个原因已经解释了结果，使另一个原因不太可能。\n",
    "\n",
    "作为一般规则，对碰撞器的调节会打开依赖路径。不调节它会使其关闭。或者\n",
    "\n",
    "$A \\!\\perp\\!\\!\\!\\perp B$\n",
    "\n",
    "和\n",
    "\n",
    "$\n",
    "A \\not\\!\\perp\\!\\!\\!\\perp B | C\n",
    "$\n",
    "\n",
    "知道了这三种结构，我们可以推导出更一般的规则。一条路径被阻塞当且仅当：\n",
    "1. 它包含一个被条件化的非冲突因子\n",
    "2. 它包含一个没有被条件化的冲突因子并且没有被条件化的后代。\n",
    "\n",
    "这是一个关于依赖如何在图中流动的备忘单。我摘自 Mark Paskin 的 [斯坦福演讲](http://ai.stanford.edu/~paskin/gm-short-course/lec2.pdf)。尖端带线的箭头表示独立，尖端不带线的箭头表示依赖。\n",
    "\n",
    "![img](data/img/graph-flow.png)\n",
    "\n",
    "作为最后一个例子，试着在下面的因果图中找出一些独立和依赖关系。\n",
    "1. 是 \\\\(D \\perp C\\\\) 吗？\n",
    "2. 是 \\\\(D \\perp C| A \\\\) 吗？\n",
    "3. 是 \\\\(D \\perp C| G \\\\) 吗？\n",
    "4. 是 \\\\(A \\perp F \\\\) 吗？\n",
    "5. 是 \\\\(A \\perp F|E \\\\) 吗？\n",
    "6. 是 \\\\(A \\perp F|E,C \\\\) 吗？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"C\", \"A\")\n",
    "g.edge(\"C\", \"B\")\n",
    "g.edge(\"D\", \"A\")\n",
    "g.edge(\"B\", \"E\")\n",
    "g.edge(\"F\", \"E\")\n",
    "g.edge(\"A\", \"G\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**答案**：\n",
    "1. \\\\(D \\perp C\\\\)。它包含一个 **未** 对其进行调节的冲突因子。\n",
    "2. \\\\(D \\not\\perp C| A \\\\)。它包含一个对其进行调节的冲突因子。\n",
    "3. \\\\(D \\not\\perp C| G \\\\)。它包含一个受条件限制的冲突因子的后代。您可以在此处将 G 视为 A 的某种代理。\n",
    "4. \\\\(A \\perp F \\\\)。它包含一个对撞机，B->E<-F，它**没有**受到限制。\n",
    "5. \\\\(A \\not\\perp F|E \\\\)。它包含一个碰撞器，B->E<-F，它已被调节。\n",
    "6. \\\\(A \\perp F|E, C \\\\)。它包含一个冲突因子，B->E<-F，它被条件化了，但它包含一个非冲突因子，它已经被条件化了。 E 上的条件化打开了路径，但 C 上的条件化再次关闭了它。\n",
    "\n",
    "了解因果图模型使我们能够理解因果推理中出现的问题。正如我们所见，问题总是归结为偏差。\n",
    "\n",
    "$\n",
    "E[Y|T=1] - E[Y|T=0] = \\underbrace{E[Y_1 - Y_0|T=1]}_{ATET} + \\underbrace{\\{ E[Y_0|T=1] - E[Y_0|T=0] \\}}_{BIAS}\n",
    "$\n",
    "\n",
    "图形模型使我们能够诊断我们正在处理的偏见以及我们需要纠正它们的工具是什么。\n",
    "\n",
    "## 混淆偏差\n",
    "\n",
    "![img](./data/img/causal-graph/both_crap.png)\n",
    "\n",
    "偏差（bias）的第一个主要原因是混淆（confounding）。当干预和结果有共同的原因时，就会发生这种情况。例如，假设干预是接受教育，结果是收入，这时很难知道接受教育对工资的因果关系，因为两者都有一个共同的原因：智力。所以我们可以提出这样的论点，即受教育程度越高的人赚的钱越多，仅仅是因为他们更聪明，而不是因为他们受过更多的教育。为了确定因果效应，我们需要关闭干预和结果之间的所有后门路径。如果我们这样做，剩下的唯一效果就是直接效果 T->Y。在我们的例子中，如果我们控制智力，即我们比较智力水平相同但受教育程度不同的人，结果的差异将仅是由于受教育程度的不同，因为比较中的每个人其智力是一样的。为了解决混淆偏差，我们需要控制治疗和结果的所有常见原因。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"207pt\" height=\"188pt\"\r\n",
       " viewBox=\"0.00 0.00 207.30 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 203.298,-184 203.298,4 -4,4\"/>\r\n",
       "<!-- X -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>X</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"66\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"66\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X</text>\r\n",
       "</g>\r\n",
       "<!-- T -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>T</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">T</text>\r\n",
       "</g>\r\n",
       "<!-- X&#45;&gt;T -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>X&#45;&gt;T</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M56.9546,-144.765C52.18,-136.195 46.2181,-125.494 40.8731,-115.9\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"43.9209,-114.18 35.9964,-107.147 37.8059,-117.586 43.9209,-114.18\"/>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"46\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"46\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- X&#45;&gt;Y -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>X&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M66.8179,-143.794C67.3501,-125.761 67.2876,-96.6757 63,-72 61.4711,-63.201 58.8304,-53.8474 56.0799,-45.4865\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"59.3306,-44.1781 52.7379,-35.8843 52.7195,-46.4791 59.3306,-44.1781\"/>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;Y -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>T&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M31.5994,-72.055C33.7465,-64.1446 36.3535,-54.5398 38.7561,-45.6879\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"42.2012,-46.3567 41.443,-35.789 35.4456,-44.523 42.2012,-46.3567\"/>\r\n",
       "</g>\r\n",
       "<!-- 智力 -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>智力</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"161\" cy=\"-162\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"161\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">智力</text>\r\n",
       "</g>\r\n",
       "<!-- 教育 -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>教育</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"145\" cy=\"-90\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"145\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">教育</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;教育 -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>智力&#45;&gt;教育</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M157.127,-144.055C155.345,-136.261 153.188,-126.822 151.189,-118.079\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"154.589,-117.244 148.949,-108.275 147.765,-118.804 154.589,-117.244\"/>\r\n",
       "</g>\r\n",
       "<!-- 工资 -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>工资</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"172\" cy=\"-18\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"172\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">工资</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;工资 -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>智力&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M168.892,-144.731C173.426,-134.463 178.671,-120.771 181,-108 184.762,-87.3769 182.054,-63.7578 178.654,-46.036\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"181.992,-44.9029 176.499,-35.8432 175.143,-46.351 181.992,-44.9029\"/>\r\n",
       "</g>\r\n",
       "<!-- 教育&#45;&gt;工资 -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>教育&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M151.399,-72.411C154.514,-64.3352 158.334,-54.4312 161.835,-45.3547\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"165.126,-46.5458 165.46,-35.9562 158.595,-44.0267 165.126,-46.5458\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea13aaa48>"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"X\", \"T\")\n",
    "g.edge(\"X\", \"Y\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.node('Y', 'Y', color='red')\n",
    "\n",
    "g.edge(\"智力\", \"教育\"),\n",
    "g.edge(\"智力\", \"工资\"),\n",
    "g.edge(\"教育\", \"工资\")\n",
    "g.node(\"工资\", \"工资\", color='red')\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "不幸的是，并非总是可以控制所有常见原因。 有时，存在我们无法衡量的未知原因或已知原因。 智力的情况是后者之一。 尽管付出了很多努力，但科学家们还没有弄清楚如何很好地衡量智力。 我将在这里使用 U 表示未测量的变量。 现在，假设智力不能直接影响你的教育。 它只会影响你在 SAT 上的表现，但 SAT 决定了你的教育水平，因为它开启了进入一所好大学的可能性。 即使我们无法控制不可测量的情报，我们也可以控制 SAT 并关闭后门路径。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"380pt\" height=\"260pt\"\r\n",
       " viewBox=\"0.00 0.00 380.30 260.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-256 376.298,-256 376.298,4 -4,4\"/>\r\n",
       "<!-- X1 -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>X1</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X1</text>\r\n",
       "</g>\r\n",
       "<!-- T -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>T</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"95\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"95\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">T</text>\r\n",
       "</g>\r\n",
       "<!-- X1&#45;&gt;T -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>X1&#45;&gt;T</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M41.0896,-146.496C50.5719,-136.735 63.2443,-123.69 73.9201,-112.7\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"76.471,-115.097 80.9284,-105.485 71.45,-110.22 76.471,-115.097\"/>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"95\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"95\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- X1&#45;&gt;Y -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>X1&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M31.6944,-144.076C37.086,-125.774 46.7826,-95.9975 59,-72 64.0718,-62.0378 70.7097,-51.7293 76.9024,-42.8817\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.8708,-44.7488 82.8722,-34.588 74.1895,-40.6594 79.8708,-44.7488\"/>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;Y -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>T&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M95,-71.6966C95,-63.9827 95,-54.7125 95,-46.1124\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"98.5001,-46.1043 95,-36.1043 91.5001,-46.1044 98.5001,-46.1043\"/>\r\n",
       "</g>\r\n",
       "<!-- X2 -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>X2</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"99\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"99\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X2</text>\r\n",
       "</g>\r\n",
       "<!-- X2&#45;&gt;T -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>X2&#45;&gt;T</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M98.0112,-143.697C97.5704,-135.983 97.0407,-126.712 96.5493,-118.112\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"100.042,-117.888 95.9774,-108.104 93.0537,-118.288 100.042,-117.888\"/>\r\n",
       "</g>\r\n",
       "<!-- U -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>U</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"127\" cy=\"-234\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"127\" y=\"-230.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">U</text>\r\n",
       "</g>\r\n",
       "<!-- U&#45;&gt;Y -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>U&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M131.157,-215.975C137.68,-185.958 147.928,-122.06 131,-72 127.419,-61.4101 121.101,-50.9667 114.748,-42.1869\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"117.345,-39.8225 108.482,-34.0178 111.791,-44.0827 117.345,-39.8225\"/>\r\n",
       "</g>\r\n",
       "<!-- U&#45;&gt;X2 -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>U&#45;&gt;X2</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M120.364,-216.411C117.087,-208.216 113.056,-198.14 109.382,-188.955\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"112.552,-187.455 105.588,-179.47 106.052,-190.055 112.552,-187.455\"/>\r\n",
       "</g>\r\n",
       "<!-- 家庭收入 -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>家庭收入</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"226\" cy=\"-162\" rx=\"44.393\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"226\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">家庭收入</text>\r\n",
       "</g>\r\n",
       "<!-- 教育 -->\r\n",
       "<g id=\"node7\" class=\"node\"><title>教育</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"308\" cy=\"-90\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"308\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">教育</text>\r\n",
       "</g>\r\n",
       "<!-- 家庭收入&#45;&gt;教育 -->\r\n",
       "<g id=\"edge7\" class=\"edge\"><title>家庭收入&#45;&gt;教育</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M244.198,-145.465C255.887,-135.486 271.202,-122.413 283.897,-111.576\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"286.463,-113.987 291.796,-104.832 281.918,-108.663 286.463,-113.987\"/>\r\n",
       "</g>\r\n",
       "<!-- 工资 -->\r\n",
       "<g id=\"node8\" class=\"node\"><title>工资</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"308\" cy=\"-18\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"308\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">工资</text>\r\n",
       "</g>\r\n",
       "<!-- 家庭收入&#45;&gt;工资 -->\r\n",
       "<g id=\"edge10\" class=\"edge\"><title>家庭收入&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M233.9,-144.276C242.636,-126.144 257.411,-96.5121 272,-72 277.803,-62.2503 284.696,-51.856 290.891,-42.8786\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"293.922,-44.6505 296.785,-34.4498 288.186,-40.6387 293.922,-44.6505\"/>\r\n",
       "</g>\r\n",
       "<!-- 教育&#45;&gt;工资 -->\r\n",
       "<g id=\"edge8\" class=\"edge\"><title>教育&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M308,-71.6966C308,-63.9827 308,-54.7125 308,-46.1124\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"311.5,-46.1043 308,-36.1043 304.5,-46.1044 311.5,-46.1043\"/>\r\n",
       "</g>\r\n",
       "<!-- SAT -->\r\n",
       "<g id=\"node9\" class=\"node\"><title>SAT</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"317\" cy=\"-162\" rx=\"28.6953\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"317\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">SAT</text>\r\n",
       "</g>\r\n",
       "<!-- SAT&#45;&gt;教育 -->\r\n",
       "<g id=\"edge9\" class=\"edge\"><title>SAT&#45;&gt;教育</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M314.775,-143.697C313.783,-135.983 312.592,-126.712 311.486,-118.112\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"314.946,-117.576 310.199,-108.104 308.003,-118.469 314.946,-117.576\"/>\r\n",
       "</g>\r\n",
       "<!-- 智力 -->\r\n",
       "<g id=\"node10\" class=\"node\"><title>智力</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"345\" cy=\"-234\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"345\" y=\"-230.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">智力</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;工资 -->\r\n",
       "<g id=\"edge12\" class=\"edge\"><title>智力&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M349.059,-215.929C351.306,-205.598 353.867,-192.12 355,-180 359.492,-131.961 361.633,-116.911 344,-72 339.914,-61.5944 333.489,-51.1906 327.189,-42.3908\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"329.829,-40.0774 321.024,-34.1851 324.233,-44.2823 329.829,-40.0774\"/>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;SAT -->\r\n",
       "<g id=\"edge11\" class=\"edge\"><title>智力&#45;&gt;SAT</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M338.364,-216.411C335.134,-208.335 331.172,-198.431 327.542,-189.355\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"330.746,-187.941 323.782,-179.956 324.247,-190.541 330.746,-187.941\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea1397188>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"X1\", \"T\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"X2\", \"T\")\n",
    "g.edge(\"X1\", \"Y\")\n",
    "g.edge(\"U\", \"X2\")\n",
    "g.edge(\"U\", \"Y\")\n",
    "\n",
    "g.edge(\"家庭收入\", \"教育\")\n",
    "g.edge(\"教育\", \"工资\")\n",
    "g.edge(\"SAT\", \"教育\")\n",
    "g.edge(\"家庭收入\", \"工资\")\n",
    "g.edge(\"智力\", \"SAT\")\n",
    "g.edge(\"智力\", \"工资\")\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在下图中，对 X1 和 X2 或 SAT 和家庭收入的调节足以关闭处理和结果之间的所有后门路径。换句话说，\\\\((Y_0, Y_1) \\perp T | X1, X2\\\\)。因此，即使我们无法测量所有常见原因，如果我们控制可测量的变量，这些变量可以调节未测量对治疗的影响，我们仍然可以获得条件独立性。快速说明一下，我们也有 \\\\((Y_0, Y_1) \\perp T | X1, U\\\\)，但是由于我们无法观察 U，因此我们不能以它为条件。\n",
    "\n",
    "但如果情况并非如此呢？如果未测量的变量直接导致治疗和结果怎么办？在下面的例子中，智力直接导致教育和收入。因此，治疗教育与结果工资之间的关系存在混淆。在这种情况下，我们无法控制混杂因素，因为它是不可测量的。但是，我们还有其他测量变量可以作为混杂因素的代理。这些变量不在后门路径中，但控制它们将有助于降低偏差（但不会消除偏差）。这些变量有时被称为替代混杂因素。\n",
    "\n",
    "在我们的例子中，我们不能测量智力，但我们可以测量它的一些原因，比如父亲和母亲的教育，以及它的一些影响，比如 IQ 或 SAT 分数。控制这些替代变量不足以消除偏差，但它确实有帮助。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"441pt\" height=\"260pt\"\r\n",
       " viewBox=\"0.00 0.00 440.89 260.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-256 436.894,-256 436.894,4 -4,4\"/>\r\n",
       "<!-- X -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>X</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"54\" cy=\"-234\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"54\" y=\"-230.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X</text>\r\n",
       "</g>\r\n",
       "<!-- U -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>U</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"54\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"54\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">U</text>\r\n",
       "</g>\r\n",
       "<!-- X&#45;&gt;U -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>X&#45;&gt;U</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54,-215.697C54,-207.983 54,-198.712 54,-190.112\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"57.5001,-190.104 54,-180.104 50.5001,-190.104 57.5001,-190.104\"/>\r\n",
       "</g>\r\n",
       "<!-- T -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>T</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">T</text>\r\n",
       "</g>\r\n",
       "<!-- U&#45;&gt;T -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>U&#45;&gt;T</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M47.6014,-144.411C44.4864,-136.335 40.6663,-126.431 37.1654,-117.355\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"40.4045,-116.027 33.5403,-107.956 33.8735,-118.546 40.4045,-116.027\"/>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"55\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"55\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- U&#45;&gt;Y -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>U&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M57.6538,-143.908C59.6758,-133.569 61.9808,-120.09 63,-108 64.3441,-92.0566 64.1974,-87.9551 63,-72 62.3634,-63.5179 61.1616,-54.3361 59.8792,-46.0356\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"63.3237,-45.412 58.2483,-36.1119 56.4164,-46.5472 63.3237,-45.412\"/>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;Y -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>T&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M33.6356,-72.411C36.9134,-64.2164 40.9442,-54.1395 44.6181,-44.9548\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"47.9477,-46.0546 48.412,-35.4699 41.4484,-43.4548 47.9477,-46.0546\"/>\r\n",
       "</g>\r\n",
       "<!-- 智力 -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>智力</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"248\" cy=\"-162\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"248\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">智力</text>\r\n",
       "</g>\r\n",
       "<!-- IQ -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>IQ</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"137\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">IQ</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;IQ -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>智力&#45;&gt;IQ</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M228.918,-148.967C211.287,-137.848 184.877,-121.193 164.884,-108.585\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"166.53,-105.484 156.204,-103.111 162.796,-111.405 166.53,-105.484\"/>\r\n",
       "</g>\r\n",
       "<!-- SAT -->\r\n",
       "<g id=\"node7\" class=\"node\"><title>SAT</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"211\" cy=\"-90\" rx=\"28.6953\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"211\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">SAT</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;SAT -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>智力&#45;&gt;SAT</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M239.418,-144.765C234.999,-136.404 229.508,-126.016 224.535,-116.606\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"227.559,-114.839 219.792,-107.633 221.371,-118.11 227.559,-114.839\"/>\r\n",
       "</g>\r\n",
       "<!-- 教育 -->\r\n",
       "<g id=\"node10\" class=\"node\"><title>教育</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"285\" cy=\"-90\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"285\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">教育</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;教育 -->\r\n",
       "<g id=\"edge9\" class=\"edge\"><title>智力&#45;&gt;教育</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M256.582,-144.765C261.065,-136.283 266.651,-125.714 271.681,-116.197\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"274.886,-117.624 276.465,-107.147 268.698,-114.353 274.886,-117.624\"/>\r\n",
       "</g>\r\n",
       "<!-- 工资 -->\r\n",
       "<g id=\"node11\" class=\"node\"><title>工资</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"312\" cy=\"-18\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"312\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">工资</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;工资 -->\r\n",
       "<g id=\"edge11\" class=\"edge\"><title>智力&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M270.394,-151.687C287.719,-143.138 310.473,-128.651 321,-108 330.878,-88.6205 327.253,-63.8056 321.944,-45.304\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"325.27,-44.2127 318.896,-35.7494 318.601,-46.3401 325.27,-44.2127\"/>\r\n",
       "</g>\r\n",
       "<!-- 父亲受教育水平 -->\r\n",
       "<g id=\"node8\" class=\"node\"><title>父亲受教育水平</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"208\" cy=\"-234\" rx=\"68.7879\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"208\" y=\"-230.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">父亲受教育水平</text>\r\n",
       "</g>\r\n",
       "<!-- 父亲受教育水平&#45;&gt;智力 -->\r\n",
       "<g id=\"edge7\" class=\"edge\"><title>父亲受教育水平&#45;&gt;智力</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M217.683,-216.055C222.469,-207.679 228.341,-197.404 233.638,-188.134\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"236.759,-189.726 238.682,-179.307 230.681,-186.253 236.759,-189.726\"/>\r\n",
       "</g>\r\n",
       "<!-- 母亲受教育水平 -->\r\n",
       "<g id=\"node9\" class=\"node\"><title>母亲受教育水平</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"364\" cy=\"-234\" rx=\"68.7879\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"364\" y=\"-230.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">母亲受教育水平</text>\r\n",
       "</g>\r\n",
       "<!-- 母亲受教育水平&#45;&gt;智力 -->\r\n",
       "<g id=\"edge8\" class=\"edge\"><title>母亲受教育水平&#45;&gt;智力</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M337.967,-217.291C319.598,-206.206 295.007,-191.366 276.177,-180.003\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"277.819,-176.906 267.448,-174.736 274.202,-182.9 277.819,-176.906\"/>\r\n",
       "</g>\r\n",
       "<!-- 教育&#45;&gt;工资 -->\r\n",
       "<g id=\"edge10\" class=\"edge\"><title>教育&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M291.399,-72.411C294.514,-64.3352 298.334,-54.4312 301.835,-45.3547\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"305.126,-46.5458 305.46,-35.9562 298.595,-44.0267 305.126,-46.5458\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea1397a88>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"X\", \"U\")\n",
    "g.edge(\"U\", \"T\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"U\", \"Y\")\n",
    "\n",
    "g.edge(\"智力\", \"IQ\")\n",
    "g.edge(\"智力\", \"SAT\")\n",
    "g.edge(\"父亲受教育水平\", \"智力\")\n",
    "g.edge(\"母亲受教育水平\", \"智力\")\n",
    "\n",
    "g.edge(\"智力\", \"教育\")\n",
    "g.edge(\"教育\", \"工资\")\n",
    "g.edge(\"智力\", \"工资\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 选择偏差\n",
    "\n",
    "您可能认为将所有可以测度的信息添加到模型中是一个好主意，以确保没有混淆偏差。嗯，再想想。\n",
    "\n",
    "![image.png](./data/img/causal-graph/selection_bias.png)\n",
    "\n",
    "偏差的第二大来源是我们所说的选择偏差。如果当我们没有控制一个共同的原因时会发生混淆偏差，那么选择偏差则与最终效果更相关。这里要提醒一句，经济学家倾向于将各种偏见称为选择偏差。在这里，我认为区分它和混淆偏见是很有帮助的，所以我会坚持下去。\n",
    "\n",
    "通常情况下，当我们控制的变量比我们应该控制的多时，就会出现选择偏差。可能的情况是干预和潜在结果是略微独立的，但一旦我们以冲突因子为条件就变得相互依赖了。\n",
    "\n",
    "想象一下，在某个奇迹的帮助下，您终于能够随机化教育以衡量其对工资的影响。但是为了确保您不会混淆，您可以控制很多变量。其中，您控制投资。但投资并不是教育和工资的共同原因。相反，它是两者的结果。受过更多教育的人既赚得更多，投资也更多。此外，那些赚得更多的人投资更多。由于投资是一个冲突因子，通过对它进行调节，您在处理和结果之间开辟了第二条路径，这将使衡量直接影响变得更加困难。一种思考方式是，通过控制投资，您可以查看投资相同的小部分人群，然后发现教育对这些群体的影响。但是通过这样做，您也间接地和无意地不允许工资发生太大变化。结果，您将无法看到教育如何改变工资，因为您不允许工资按应有的方式变化。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"225pt\" height=\"188pt\"\r\n",
       " viewBox=\"0.00 0.00 225.30 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 221.298,-184 221.298,4 -4,4\"/>\r\n",
       "<!-- T -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>T</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"45\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"45\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">T</text>\r\n",
       "</g>\r\n",
       "<!-- X -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>X</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X</text>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;X -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>T&#45;&gt;X</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M37.7071,-144.179C33.6683,-133.937 29.0588,-120.464 27,-108 23.6218,-87.5481 23.7843,-64.09 24.7042,-46.3932\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"28.2108,-46.3985 25.35,-36.1972 21.2248,-45.956 28.2108,-46.3985\"/>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"63\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"63\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;Y -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>T&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M49.3573,-144.055C51.3914,-136.145 53.8612,-126.54 56.1374,-117.688\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"59.5821,-118.346 58.6828,-107.789 52.8026,-116.602 59.5821,-118.346\"/>\r\n",
       "</g>\r\n",
       "<!-- Y&#45;&gt;X -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>Y&#45;&gt;X</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.6504,-72.7646C50.2885,-64.2831 44.8531,-53.7144 39.9587,-44.1974\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"42.9904,-42.4395 35.3043,-35.1473 36.7654,-45.6409 42.9904,-42.4395\"/>\r\n",
       "</g>\r\n",
       "<!-- 教育水平 -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>教育水平</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"153\" cy=\"-162\" rx=\"44.393\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"153\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">教育水平</text>\r\n",
       "</g>\r\n",
       "<!-- 投资 -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>投资</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"162\" cy=\"-18\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"162\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">投资</text>\r\n",
       "</g>\r\n",
       "<!-- 教育水平&#45;&gt;投资 -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>教育水平&#45;&gt;投资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M152.491,-143.906C152.121,-125.964 151.986,-96.9517 154,-72 154.684,-63.5216 155.9,-54.3409 157.18,-46.0402\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"160.643,-46.5492 158.801,-36.1158 153.735,-45.4208 160.643,-46.5492\"/>\r\n",
       "</g>\r\n",
       "<!-- 工资 -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>工资</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"190\" cy=\"-90\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"190\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">工资</text>\r\n",
       "</g>\r\n",
       "<!-- 教育水平&#45;&gt;工资 -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>教育水平&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M161.957,-144.055C166.338,-135.767 171.702,-125.618 176.562,-116.424\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"179.802,-117.784 181.381,-107.307 173.613,-114.512 179.802,-117.784\"/>\r\n",
       "</g>\r\n",
       "<!-- 工资&#45;&gt;投资 -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>工资&#45;&gt;投资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M183.364,-72.411C180.087,-64.2164 176.056,-54.1395 172.382,-44.9548\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"175.552,-43.4548 168.588,-35.4699 169.052,-46.0546 175.552,-43.4548\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea138fc88>"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "g.edge(\"T\", \"X\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"Y\", \"X\")\n",
    "g.node(\"Y\", \"Y\", color=\"red\")\n",
    "\n",
    "g.edge(\"教育水平\", \"投资\")\n",
    "g.edge(\"教育水平\", \"工资\")\n",
    "g.edge(\"工资\", \"投资\")\n",
    "\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了说明为什么会这样，假设投资和教育仅采用 2 个值。人们要么投资，要么不投资。他们要么受过教育，要么没受过教育。最初，当我们不控制投资时，偏差项为零：\\\\(E[Y_0|T=1] - E[Y_0|T=0] = 0\\\\) 因为教育是随机的。这意味着人们在没有接受教育 \\\\(Wage_0\\\\) 的情况下获得的工资是相同的，如果他们接受或不接受教育治疗。但是，如果我们以投资为条件会发生什么？\n",
    "\n",
    "看看那些投资的，我们可能有这样的情况\\\\(E[Y_0|T=0, I=1] > E[Y_0|T=1, I=1]\\\\)。换句话说，在那些投资的人中，那些即使没有受过教育也能做到的人，更能独立于教育来获得高收入。因此，这些人的工资 \\\\(Wage_0|T=0\\\\) 可能高于受过教育的群体在没有受过教育的情况下的工资，\\\\(Wage_0|T =1\\\\)。类似的推理可以应用于那些不投资的人，我们也可能有 \\\\(E[Y_0|T=0, I=0] > E[Y_0|T=1, I=0]\\\\) .那些即使受过教育也不投资的人，如果没有受过教育，他们的工资可能会低于那些没有投资但也没有受过教育的人。\n",
    "\n",
    "使用纯粹的图形论证，如果有人投资，知道他们受过高等教育就可以解释第二个原因，那就是工资。以投资为条件，高等教育与低工资相关，我们有一个负偏差\\\\(E[Y_0|T=0, I=i] > E[Y_0|T=1, I=i]\\\\)。\n",
    "\n",
    "顺便提一下，如果我们以共同效应的任何后代为条件，那么我们讨论的所有这些也是正确的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"90pt\" height=\"260pt\"\r\n",
       " viewBox=\"0.00 0.00 90.00 260.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-256 86,-256 86,4 -4,4\"/>\r\n",
       "<!-- T -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>T</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-234\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-230.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">T</text>\r\n",
       "</g>\r\n",
       "<!-- X -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>X</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X</text>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;X -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>T&#45;&gt;X</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M23.7517,-215.888C21.9542,-205.542 19.9053,-192.063 19,-180 17.8026,-164.045 17.8026,-159.955 19,-144 19.6366,-135.518 20.8384,-126.336 22.1208,-118.036\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"25.5836,-118.547 23.7517,-108.112 18.6763,-117.412 25.5836,-118.547\"/>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"55\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"55\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;Y -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>T&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M33.6356,-216.411C36.9134,-208.216 40.9442,-198.14 44.6181,-188.955\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"47.9477,-190.055 48.412,-179.47 41.4484,-187.455 47.9477,-190.055\"/>\r\n",
       "</g>\r\n",
       "<!-- S -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>S</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">S</text>\r\n",
       "</g>\r\n",
       "<!-- X&#45;&gt;S -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>X&#45;&gt;S</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M27,-71.6966C27,-63.9827 27,-54.7125 27,-46.1124\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"30.5001,-46.1043 27,-36.1043 23.5001,-46.1044 30.5001,-46.1043\"/>\r\n",
       "</g>\r\n",
       "<!-- Y&#45;&gt;X -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>Y&#45;&gt;X</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M48.3644,-144.411C45.0866,-136.216 41.0558,-126.14 37.3819,-116.955\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"40.5516,-115.455 33.588,-107.47 34.0523,-118.055 40.5516,-115.455\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea139c208>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# X is a common effect of T and Y; S (drawn in red) is a descendant of X\n",
    "g = gr.Digraph()\n",
    "g.edge(\"T\", \"X\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"Y\", \"X\")\n",
    "g.edge(\"X\", \"S\")\n",
    "g.node(\"S\", \"S\", color=\"red\")\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
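    "A similar thing happens when we condition on a mediator of the treatment. A mediator is a variable that sits between the treatment and the outcome and, as the name says, mediates the causal effect. For example, suppose again that you are able to randomize education. But, just to be sure, you decide to control for whether or not the person has a white-collar job. Once again, this conditioning biases the causal effect estimate. This time it is not because it opens a front-door path with a collider, but because it closes one of the channels through which the treatment operates. In our example, getting a white-collar job is one of the ways in which more education leads to higher pay. By controlling for it, we close this channel and leave open only the direct effect of education on wage."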
   ]
  },
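  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of what closing a channel means, assume a linear model. This is purely for illustration: the linear form, the coefficients \\\\(a\\\\), \\\\(b\\\\), \\\\(c\\\\) and the independent noise terms \\\\(u_X\\\\), \\\\(u_Y\\\\) are assumptions, not something stated in the text. Let \\\\(T\\\\) be education, \\\\(X\\\\) the white-collar job and \\\\(Y\\\\) the wage:\n",
    "\n",
    "$\n",
    "X = cT + u_X, \\qquad Y = aT + bX + u_Y\n",
    "$\n",
    "\n",
    "Substituting the first equation into the second gives\n",
    "\n",
    "$\n",
    "Y = (a + bc)T + b u_X + u_Y\n",
    "$\n",
    "\n",
    "Under these assumptions, with \\\\(T\\\\) randomized, comparing wages across levels of \\\\(T\\\\) recovers the total effect \\\\(a + bc\\\\). Holding \\\\(X\\\\) fixed removes the \\\\(bc\\\\) term and leaves only the direct effect \\\\(a\\\\), which is smaller than the total effect whenever \\\\(b\\\\) and \\\\(c\\\\) are both positive."
   ]
  },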
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"225pt\" height=\"188pt\"\r\n",
       " viewBox=\"0.00 0.00 225.30 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 221.298,-184 221.298,4 -4,4\"/>\r\n",
       "<!-- T -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>T</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"68\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"68\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">T</text>\r\n",
       "</g>\r\n",
       "<!-- X -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>X</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"27\" cy=\"-90\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">X</text>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;X -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>T&#45;&gt;X</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M58.4907,-144.765C53.4712,-136.195 47.2036,-125.494 41.5845,-115.9\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"44.5319,-114.007 36.4577,-107.147 38.4917,-117.545 44.5319,-114.007\"/>\r\n",
       "</g>\r\n",
       "<!-- Y -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>Y</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"54\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"54\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">Y</text>\r\n",
       "</g>\r\n",
       "<!-- T&#45;&gt;Y -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>T&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M67.6077,-143.89C67.063,-125.936 65.8061,-96.9132 63,-72 62.0451,-63.5226 60.6247,-54.3421 59.193,-46.0413\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"62.6229,-45.339 57.4076,-36.1167 55.7335,-46.5785 62.6229,-45.339\"/>\r\n",
       "</g>\r\n",
       "<!-- X&#45;&gt;Y -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>X&#45;&gt;Y</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M33.3986,-72.411C36.5136,-64.3352 40.3337,-54.4312 43.8346,-45.3547\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"47.1265,-46.5458 47.4597,-35.9562 40.5955,-44.0267 47.1265,-46.5458\"/>\r\n",
       "</g>\r\n",
       "<!-- 教育 -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>教育</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"167\" cy=\"-162\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"167\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">教育</text>\r\n",
       "</g>\r\n",
       "<!-- 白领工作 -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>白领工作</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"154\" cy=\"-90\" rx=\"44.393\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"154\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">白领工作</text>\r\n",
       "</g>\r\n",
       "<!-- 教育&#45;&gt;白领工作 -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>教育&#45;&gt;白领工作</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M163.853,-144.055C162.421,-136.346 160.691,-127.027 159.082,-118.364\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"162.475,-117.468 157.208,-108.275 155.593,-118.746 162.475,-117.468\"/>\r\n",
       "</g>\r\n",
       "<!-- 工资 -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>工资</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"190\" cy=\"-18\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"190\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">工资</text>\r\n",
       "</g>\r\n",
       "<!-- 教育&#45;&gt;工资 -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>教育&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M181.845,-146.78C191.042,-136.895 202.02,-122.866 207,-108 213.925,-87.3303 208.371,-62.9598 201.82,-44.9515\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"204.979,-43.4218 198.043,-35.413 198.47,-45.9992 204.979,-43.4218\"/>\r\n",
       "</g>\r\n",
       "<!-- 白领工作&#45;&gt;工资 -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>白领工作&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M162.715,-72.055C166.977,-63.7665 172.197,-53.6178 176.925,-44.4241\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"180.153,-45.8006 181.614,-35.307 173.927,-42.5992 180.153,-45.8006\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea139cd08>"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
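    "# Abstract version: X (drawn in red) is a mediator between T and Y\n",
    "g = gr.Digraph()\n",
    "g.edge(\"T\", \"X\")\n",
    "g.edge(\"T\", \"Y\")\n",
    "g.edge(\"X\", \"Y\")\n",
    "g.node(\"X\", \"X\", color=\"red\")\n",
    "\n",
    "# Same structure with the example's variables:\n",
    "# 教育 = education, 白领工作 = white-collar job, 工资 = wage\n",
    "g.edge(\"教育\", \"白领工作\")\n",
    "g.edge(\"教育\", \"工资\")\n",
    "g.edge(\"白领工作\", \"工资\")\n",
    "\n",
    "g"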
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
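    "To give a potential-outcomes argument: we know that, due to randomization, the bias is zero, \\\\(E[Y_0|T=0] - E[Y_0|T=1] = 0\\\\). However, if we condition on the white-collar individuals, we have \\\\(E[Y_0|T=0, WC=1] > E[Y_0|T=1, WC=1]\\\\). That is because those who manage to get a white-collar job even without education are probably more hard-working than those who need the help of education to get the same job. By the same reasoning, \\\\(E[Y_0|T=0, WC=0] > E[Y_0|T=1, WC=0]\\\\), because those who did not get a white-collar job even with education are probably less hard-working than those who also did not get one but had no education.\n",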
    "\n",
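    "One way to make the direction of this bias explicit is to decompose the observed difference within the white-collar stratum. This is a sketch that uses only the potential-outcomes notation above, together with the fact that the observed \\\\(Y\\\\) equals \\\\(Y_1\\\\) for the treated and \\\\(Y_0\\\\) for the untreated:\n",
    "\n",
    "$\n",
    "E[Y|T=1, WC=1] - E[Y|T=0, WC=1] = \\underbrace{E[Y_1 - Y_0|T=1, WC=1]}_{\\text{effect on the treated}} + \\underbrace{E[Y_0|T=1, WC=1] - E[Y_0|T=0, WC=1]}_{\\text{bias}}\n",
    "$\n",
    "\n",
    "By the inequality for \\\\(WC=1\\\\), the bias term is negative, so the observed difference understates the effect. The same decomposition applies to the \\\\(WC=0\\\\) stratum.\n",
    "\n",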
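    "In our case, conditioning on the mediator induces a negative bias. It makes the effect of education look smaller than it actually is. This happens because the causal effect is positive. If the effect were negative, conditioning on a mediator would induce a positive bias. In all cases, this sort of conditioning makes the effect look weaker than it is.\n",
    "\n",
    "To put it more colloquially, suppose you have to choose between two candidates for a job at your company. Both have equally impressive professional achievements, but one does not have a higher education degree. Which one should you choose? Of course, you should go with the one without the degree, because he managed to achieve the same things as the other candidate with the odds stacked against him.\n",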
    "\n",
    "![image.png](./data/img/causal-graph/three_bias.png)\n",
    "\n",
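    "## Key Ideas\n",
    "\n",
    "We have studied graphical models as a language for better understanding and expressing ideas about causality. We gave a quick summary of the rules of conditional independence on a graph. This then helped us explore three structures that can lead to bias.\n",
    "\n",
    "The first is confounding, which happens when the treatment and the outcome share a common cause that we do not account for or control for. The second is selection bias due to conditioning on a common effect. This kind of excessive controlling can lead to bias even if the treatment was randomly assigned. The third structure is also a form of selection bias, this time due to excessive controlling for mediator variables. Selection bias can often be fixed by simply doing nothing, which is exactly why it is so dangerous. Since we are biased toward action, we tend to see the idea of controlling for things as clever, even when it may do more harm than good."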
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\r\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\r\n",
       " -->\r\n",
       "<!-- Title: %3 Pages: 1 -->\r\n",
       "<svg width=\"224pt\" height=\"188pt\"\r\n",
       " viewBox=\"0.00 0.00 224.49 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\r\n",
       "<title>%3</title>\r\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 220.494,-184 220.494,4 -4,4\"/>\r\n",
       "<!-- 智力 -->\r\n",
       "<g id=\"node1\" class=\"node\"><title>智力</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"64.2976\" cy=\"-162\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"64.2976\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">智力</text>\r\n",
       "</g>\r\n",
       "<!-- 教育 -->\r\n",
       "<g id=\"node2\" class=\"node\"><title>教育</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"27.2976\" cy=\"-90\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"27.2976\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">教育</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;教育 -->\r\n",
       "<g id=\"edge1\" class=\"edge\"><title>智力&#45;&gt;教育</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M55.716,-144.765C51.233,-136.283 45.6466,-125.714 40.6162,-116.197\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"43.6,-114.353 35.8326,-107.147 37.4114,-117.624 43.6,-114.353\"/>\r\n",
       "</g>\r\n",
       "<!-- 工资 -->\r\n",
       "<g id=\"node3\" class=\"node\"><title>工资</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"45.2976\" cy=\"-18\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"45.2976\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">工资</text>\r\n",
       "</g>\r\n",
       "<!-- 智力&#45;&gt;工资 -->\r\n",
       "<g id=\"edge2\" class=\"edge\"><title>智力&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M65.6983,-143.794C66.7958,-125.761 67.5159,-96.6749 63.2976,-72 61.7842,-63.1473 59.0174,-53.7787 56.0996,-45.4207\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"59.3007,-43.9865 52.5396,-35.8294 52.7381,-46.4224 59.3007,-43.9865\"/>\r\n",
       "</g>\r\n",
       "<!-- 教育&#45;&gt;工资 -->\r\n",
       "<g id=\"edge3\" class=\"edge\"><title>教育&#45;&gt;工资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M31.6549,-72.055C33.689,-64.1446 36.1588,-54.5398 38.435,-45.6879\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"41.8796,-46.3456 40.9804,-35.789 35.1002,-44.6023 41.8796,-46.3456\"/>\r\n",
       "</g>\r\n",
       "<!-- 教育水平 -->\r\n",
       "<g id=\"node4\" class=\"node\"><title>教育水平</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"172.298\" cy=\"-162\" rx=\"44.393\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"172.298\" y=\"-158.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">教育水平</text>\r\n",
       "</g>\r\n",
       "<!-- 收入 -->\r\n",
       "<g id=\"node5\" class=\"node\"><title>收入</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"red\" cx=\"154.298\" cy=\"-90\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"154.298\" y=\"-86.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">收入</text>\r\n",
       "</g>\r\n",
       "<!-- 教育水平&#45;&gt;收入 -->\r\n",
       "<g id=\"edge4\" class=\"edge\"><title>教育水平&#45;&gt;收入</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M167.94,-144.055C165.906,-136.145 163.436,-126.54 161.16,-117.688\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"164.495,-116.602 158.615,-107.789 157.716,-118.346 164.495,-116.602\"/>\r\n",
       "</g>\r\n",
       "<!-- 投资 -->\r\n",
       "<g id=\"node6\" class=\"node\"><title>投资</title>\r\n",
       "<ellipse fill=\"none\" stroke=\"black\" cx=\"181.298\" cy=\"-18\" rx=\"27.0966\" ry=\"18\"/>\r\n",
       "<text text-anchor=\"middle\" x=\"181.298\" y=\"-14.3\" font-family=\"Times New Roman,serif\" font-size=\"14.00\">投资</text>\r\n",
       "</g>\r\n",
       "<!-- 教育水平&#45;&gt;投资 -->\r\n",
       "<g id=\"edge5\" class=\"edge\"><title>教育水平&#45;&gt;投资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M179.59,-144.179C183.629,-133.937 188.239,-120.464 190.298,-108 193.714,-87.3169 191.025,-63.7011 187.72,-45.9958\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"191.07,-44.9071 185.631,-35.8145 184.212,-46.3139 191.07,-44.9071\"/>\r\n",
       "</g>\r\n",
       "<!-- 收入&#45;&gt;投资 -->\r\n",
       "<g id=\"edge6\" class=\"edge\"><title>收入&#45;&gt;投资</title>\r\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M160.696,-72.411C163.811,-64.3352 167.631,-54.4312 171.132,-45.3547\"/>\r\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"174.424,-46.5458 174.757,-35.9562 167.893,-44.0267 174.424,-46.5458\"/>\r\n",
       "</g>\r\n",
       "</g>\r\n",
       "</svg>\r\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x14ea1372b48>"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = gr.Digraph()\n",
    "# 智力 = intelligence, 教育 = education, 工资 = wage\n",
    "g.edge(\"智力\", \"教育\")\n",
    "g.edge(\"智力\", \"工资\")\n",
    "g.edge(\"教育\", \"工资\")\n",
    "g.node('工资', '工资', color='red')\n",
    "\n",
    "# 教育水平 = education level, 收入 = income, 投资 = investment\n",
    "g.edge(\"教育水平\", \"收入\")\n",
    "g.edge(\"教育水平\", \"投资\")\n",
    "g.edge(\"收入\", \"投资\")\n",
    "g.node('收入', '收入', color='red')\n",
    "\n",
    "g"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
